QUICK: Expressive and Flexible Search over Knowledge Bases and Text Collections

Authors

  • Jeffrey Pound
  • Ihab F. Ilyas
  • Grant E. Weddell
Abstract

Recent work on Web-extracted data sets has produced an interesting new source of structured Web data. These data sets can be viewed as knowledge bases (KB) – large heterogeneous linked entity collections with millions of unique edge and node labels, often encoding rich semantic information over entities. For example, YAGO [5] and ExDB [2] have fact collections numbering in the tens and hundreds of millions respectively, and WebTables [1] contains over one hundred million extracted relations. In terms of schema information, the ExDB, YAGO, and WebTables data sets all have schema items numbering in the millions. Due to the sheer size of the schema information in these Web-extracted KBs, there has been limited development of novel tools and technologies to expose these unique data sets to end users. One of the main factors impeding the development of useful applications over these data sets is the information overload problem caused by the massive heterogeneous schema information. With KB schema items numbering in the millions, formulating structured queries is a daunting task for an application developer. Most applications are therefore built either on keyword search, which does not allow direct access to the expressive structure of the data, or on structured queries, under the assumption that users are able to digest these massive schemas. As an example, consider a user searching for all scientists who have won a Nobel award. Exploiting the structure of a knowledge base, a user may formalize the information need as the following SPARQL query.
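The contrast the abstract draws between keyword search and structured queries can be illustrated with a minimal Python sketch over a toy triple store. This is not the paper's SPARQL query and not QUICK's method; the triples, the predicate names `type` and `hasWonPrize`, and both function names are hypothetical and chosen only for illustration.

```python
# Toy knowledge base as (subject, predicate, object) triples.
# All entity and predicate names are hypothetical, for illustration only.
TRIPLES = [
    ("Marie_Curie", "type", "scientist"),
    ("Marie_Curie", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ("Albert_Einstein", "type", "scientist"),
    ("Albert_Einstein", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ("Alfred_Nobel", "type", "scientist"),
    ("Bob_Dylan", "type", "musician"),
    ("Bob_Dylan", "hasWonPrize", "Nobel_Prize_in_Literature"),
]


def structured_query(triples):
    """Subjects typed as scientist that also have a hasWonPrize edge to a Nobel prize."""
    scientists = {s for s, p, o in triples if p == "type" and o == "scientist"}
    winners = {s for s, p, o in triples if p == "hasWonPrize" and "Nobel" in o}
    return sorted(scientists & winners)


def keyword_search(triples, keywords):
    """Subjects whose concatenated triple text contains every keyword, ignoring structure."""
    text_by_subject = {}
    for s, p, o in triples:
        text_by_subject.setdefault(s, []).append(f"{s} {p} {o}".lower())
    hits = []
    for subject, texts in text_by_subject.items():
        blob = " ".join(texts)
        if all(k.lower() in blob for k in keywords):
            hits.append(subject)
    return sorted(hits)


if __name__ == "__main__":
    print(structured_query(TRIPLES))
    print(keyword_search(TRIPLES, ["scientist", "nobel"]))
```

In this toy data the keyword search also returns `Alfred_Nobel`, because the word "nobel" appears in the entity name rather than in a prize relationship. The structured version avoids that match, but only because its author already knows which predicates to use, which is the difficulty the abstract attributes to schemas with millions of items.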

Similar resources

QUICK: Queries Using Inferred Concepts from Keywords

We present QUICK, an entity-based text search engine that blends keyword search with structured query processing over rich knowledge bases (KB) with massive schemas. We introduce a new formalism for structured queries based on keywords that combines the flexibility of keyword search and the expressiveness of structured queries. We propose a solution to the resulting disambiguation problem cause...

Automatic extraction of facts, relations, and entities for web-scale knowledge base population

Equipping machines with knowledge, through the construction of machine-readable knowledge bases, presents a key asset for semantic search, machine translation, question answering, and other formidable challenges in artificial intelligence. However, human knowledge predominantly resides in books and other natural language text forms. This means that knowledge bases must be extracted and synthesiz...

Constructing Flexible Dynamic Belief Networks from First-Order Probabilistic Knowledge Bases

This paper investigates the power of first-order probabilistic logic (FOPL) as a representation language for complex dynamic situations. We introduce a sublanguage of FOPL and use it to provide a first-order version of dynamic belief networks. We show that this language is expressive enough to enable reasoning over time and to allow procedural representations of conditional probability tables. I...

A Distributed Digital Library Architecture Incorporating Different Index Styles

The New Zealand Digital Library offers several collections of information over the World Wide Web. Although full-text indexing is the primary access mechanism, musical collections can also be accessed through a novel melody retrieval system. In offering this service over a three-year period, we have had to face many practical challenges in building, maintaining, and administering diverse collec...

Big Scale Text Analytics and Smart Content Navigation

Identifying and exploring relevant content in growing document collections is a challenge for researchers, users, and system providers alike. Supporting this is crucial for companies offering knowledge in the form of documents as their core product. Our demo shows an intelligent way of doing guided research in big text collections, using the collection of the major scientific publisher Springer...

Journal title:
  • PVLDB

Volume 3

Publication date: 2010